Deciphering the explanatory potential of blood pressure variables on post-operative length of stay through hierarchical clustering: A retrospective monocentric study

Objective Mean arterial pressure is widely used as the variable to monitor during anesthesia, but many other variables have been proposed to define intraoperative arterial hypotension. The goal of the present study was to identify arterial pressure variables associated with prolonged postoperative length of stay (pLOS). Design Retrospective cohort study of adult patients who received general anesthesia for a scheduled non-cardiac surgical procedure between 15th July 2017 and 31st December 2019. Methods pLOS was defined as a stay longer than the median (main outcome), adjusted for surgery type and duration. A total of 330 arterial pressure variables were analyzed and organized through a clustering approach. We used an unsupervised hierarchical aggregation method for optimal cluster determination, employing Kendall’s tau coefficients and a penalized Bayesian information criterion. Variables were ranked using the absolute standardized mean difference (aSMD) to measure their effect on pLOS. Finally, after multivariate independence analysis, the number of variables was reduced to three. Results Our study examined 9,516 patients. When pLOS was defined as a stay strictly longer than the median, 34% of patients experienced pLOS. Key arterial pressure variables linked with this definition of pLOS included the difference between the highest and lowest pulse pressure values computed throughout the surgery (aSMD[95%CI] = 0.39[0.31–0.40], p<0.001), the cumulative time with pulse pressure above 61mmHg (aSMD = 0.21[0.17–0.25], p<0.001), and the lowest MAP during surgery (aSMD = 0.20[0.16–0.24], p<0.001). Conclusions Using a clustering approach, we identified three arterial pressure variables associated with pLOS. This scalable method can be applied to various dichotomized outcomes.


Introduction
Intra-operative hypotension (IoH) is widely recognized as a contributing factor to various post-operative complications, including myocardial ischemia, acute kidney injury, and delirium [1,2]. While numerous definitions of IoH exist [3], a consensus has emerged from the Perioperative Quality Initiative-3 workgroup. They propose a straightforward definition: a mean arterial pressure (MAP) falling below 60-70 mmHg is considered detrimental during non-cardiac surgery [4].
Most studies on IoH primarily focus on specific groups, such as vascular surgery patients, where postoperative complications are detectable through continuous monitoring and frequent biomarker analyses for myocardial ischemia or acute kidney injury. Transitioning from identifying a postoperative complication to measuring the postoperative hospital length of stay (LOS) offers significant benefits, as this outcome is universally accessible and easily obtained. Prolonged LOS (pLOS) may indicate severe postoperative complications, but nearly half of the cases can also be due to nonclinical reasons. Therefore, LOS should be viewed more as an indicator of the hospital process than solely as a reflection of comorbidities and care quality. This perspective encourages investigating the complex interplay between IoH, other blood pressure events, and their collective impact on hospital processes [1,2,4].
Our study aims to retrospectively analyze a single-center cohort, focusing specifically on the potential statistical relationship between certain intraoperative arterial pressure variables and pLOS. The primary outcome was pLOS defined as a length of stay exceeding the median duration. Secondary outcomes were pLOS defined as durations surpassing the 75th and 90th percentiles.
Hence, an automated method was introduced for selecting and aggregating relevant arterial pressure variables from non-invasive measurements of mean arterial (MAP), systolic (SAP), diastolic (DAP), and pulse (PP) pressures. The candidate variables cover a comprehensive range of potential indicators, such as minimum values, variability, and cumulative time below a given threshold. We propose using a clustering approach followed by effect-size quantification to shed light on the link between intraoperative arterial pressure and pLOS.

Study design, ethics approval and setting
This retrospective study was conducted in a tertiary academic private non-profit hospital located in a Paris suburb (France), where surgical activity is multi-purpose, excluding cardiac surgery, and which provides around 20,000 anesthetics a year (including for obstetrics and endoscopy). The study was approved by the local Ethics Committee (Chairperson, Professor Hervé) on the 18th of December 2019 (n°19-11-3). Patients were collectively informed (by means of posters) that their data could be used for research purposes, on condition that the data were anonymized. This notice included the information necessary to enable them to refuse participation. As a result of this procedure, the need for consent was waived by the Ethics Committee. Data were accessed for research purposes from 28th January 2020. Authors had no access to information that could identify individual participants during or after data collection.

Patient population
The analysis included all patients aged 18 years or older who had general anesthesia between 15th July 2017 and 31st December 2019 and stayed in hospital for at least one night. Patients were excluded if operating time was less than 20 min, if they had anesthesia more than once during the same hospitalization, or if they had an obstetric surgical procedure, lung transplantation, interventional radiology, gastro-intestinal endoscopy, or bronchoscopy. Patients with no recorded arterial pressure signal or with aberrant or incomplete signal values were also excluded. All patients were managed according to usual recommendations, especially regarding intraoperative monitoring.

Data collection
Patient characteristics and preoperative medications were collected from Cesare™, a software package for preoperative anesthetic evaluation (Bow Médical, 80440 Boves, France). Centricity Anesthesia software was used to collect intraoperative variables (GE Healthcare, 78530 Buc, France). LOS and in-hospital mortality were obtained by querying the health data warehouse.
Arterial pressure measurements were obtained from three time series: diastolic (DAP), systolic (SAP), and mean arterial pressure (MAP), which were imported from a .csv file. The pulse pressure signal (PP = SAP-DAP) was computed after pairing DAP and SAP using a Cantor pairing function, with time and patient ID as inputs. A Hampel filter, set to five median absolute deviations and constructed over all arterial pressure values, was employed to remove artifact pressure values, which were treated as missing data. Time series were excluded when the interval between two measurements exceeded 10 minutes due to missing values, unless these points were at the beginning or end of the recording. Missing values in the remaining traces were imputed using a loess smoother with a 40% span parameter. Invasive and non-invasive arterial pressure measurements were combined: continuous blood pressure values were adjusted to match the sampling frequency typical of non-invasive measurements by interpolation over a uniformly spaced 30-second time vector with a piecewise cubic Hermite interpolating polynomial (pchip).
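The artifact-removal and resampling steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy trace, and the use of an unscaled median absolute deviation are assumptions made for illustration, and the loess imputation step is omitted (the pchip interpolant simply bridges the removed point).

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def hampel_mask(values, n_mads=5.0):
    """Flag values more than n_mads median absolute deviations from the median.

    Assumption: a single global median/MAD over all values passed in,
    with an unscaled MAD.
    """
    values = np.asarray(values, dtype=float)
    median = np.nanmedian(values)
    mad = np.nanmedian(np.abs(values - median))
    return np.abs(values - median) > n_mads * mad


def resample_pchip(times_s, values, step_s=30.0):
    """Interpolate a trace onto a uniform time grid with pchip."""
    times_s = np.asarray(times_s, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = ~np.isnan(values)  # missing points are skipped by the interpolant
    grid = np.arange(times_s[keep][0], times_s[keep][-1] + step_s / 2, step_s)
    return grid, PchipInterpolator(times_s[keep], values[keep])(grid)


# Hypothetical MAP trace sampled every 150 s, with one artifact (250 mmHg).
t = np.arange(0, 1500, 150.0)
map_mmhg = np.array([80, 78, 75, 250, 72, 70, 74, 77, 79, 81], dtype=float)

artifact = hampel_mask(map_mmhg)
cleaned = np.where(artifact, np.nan, map_mmhg)  # artifacts become missing data
grid, resampled = resample_pchip(t, cleaned)
```

Because pchip is shape-preserving, the resampled trace stays within the range of the retained measurements, which is a reasonable property for pressure signals.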
All relevant data for the paper is publicly accessible in the Dryad Digital Repository (doi: 10.5061/dryad.12jm63z5r).

Outcomes
The outcome was pLOS. LOS was defined as the number of days between the date of the surgical procedure and discharge, and pLOS as a stay strictly longer than the median (main outcome) or than the 75th or 90th percentile (secondary outcomes). This definition was surgery dependent, to account for variation in LOS due to surgery complexity and severity. Patients were grouped by surgical class: digestive, thoracic, gynecological, neurosurgical, otorhinolaryngological, urological, and vascular. Each surgery class was further divided into quartiles based on intervention duration, leading to 28 sub-groups. The aim was to reduce the confounding impact of surgery duration, considering its potential correlation with surgery severity and thus pLOS. When pLOS was characterized based on median stay, if the median and third quartile (Q3) were equal, pLOS was defined as a stay longer than the median + one day. This occurs in subclasses where the distribution is tightly clustered. Patients who died post-surgery were categorized as pLOS. A label of 1 was assigned to pLOS patients and 0 otherwise (Fig 1).
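As an illustration of the median-based labeling rule described above, a minimal pandas sketch could look as follows. The column names (`surgery_class`, `duration_min`, `los_days`, `died`) and the toy data are hypothetical, and the authors' actual implementation may differ.

```python
import pandas as pd


def label_plos(df):
    """Return a 0/1 pLOS label per patient (median-based definition)."""
    df = df.copy()
    # Quartiles of surgery duration within each surgical class.
    df["duration_q"] = df.groupby("surgery_class")["duration_min"].transform(
        lambda s: pd.qcut(s, 4, labels=False, duplicates="drop")
    )
    grp = df.groupby(["surgery_class", "duration_q"])["los_days"]
    median = grp.transform("median")
    q3 = grp.transform(lambda s: s.quantile(0.75))
    # If median == Q3 (tightly clustered stays), use median + one day.
    threshold = median + (median == q3).astype(int)
    # Patients who died are categorized as pLOS.
    plos = (df["los_days"] > threshold) | df["died"]
    return plos.astype(int)


# Hypothetical toy cohort: one surgical class, eight patients.
demo = pd.DataFrame({
    "surgery_class": ["digestive"] * 8,
    "duration_min": [30, 40, 50, 60, 70, 80, 90, 100],
    "los_days": [2, 2, 3, 5, 1, 4, 6, 6],
    "died": [False] * 7 + [True],
})
demo["plos"] = label_plos(demo)
```

The same function could be adapted to the secondary outcomes by replacing the median with the 75th or 90th percentile of each subgroup.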

Statistical analyses
From the arterial pressure recordings, variables were calculated from MAP, SAP, DAP, and PP using a consistent methodology. These included the intra-operative minimum, maximum, mean, median, standard deviation, and variability, with the latter defined as the standard deviation divided by the mean (Fig 2A). 'Drop' variables were defined as the differences between the maximum and minimum values recorded throughout the entire anesthesia. For instance, Drop PP refers to the maximal difference in PP observed during the intervention. Additionally, the cumulative time and area below various thresholds were calculated, with thresholds chosen at every millimeter of mercury between the 5th and 85th percentiles. For each threshold, the cumulative time and area spent below it were scaled relative to the total intervention time. For instance, CumTimePP>61mmHg designates the duration over the entire surgery during which PP was higher than 61mmHg. This resulted in 330 features, although the area under the curve was discarded as it was overly redundant with the cumulative time and would only add noise and complexity in interpreting the results (S1 Table).
The first step was to cluster the variables and select the best candidate within each cluster. The purpose of clustering variables is to group them based on their correlation in an unsupervised manner, independent of the pLOS outcome. A correlation matrix using Kendall's tau for each pair of variables was constructed, employing the native correlation function from the pandas v1.4.3 Python library [5] (Fig 2B). We used Kendall's tau as it is appropriate for non-parametric distributions and handles ties in ranks. The absolute values of each entry in this matrix were calculated to obtain a measure of the strength of the relationship between variables. This matrix was used to perform a hierarchical agglomeration of the variables using Ward distance. Hierarchical agglomeration groups features into clusters progressively by merging them based on similarity, measured by Kendall's tau. Ward distance is used to minimize variance within each cluster. This agglomeration process can be visualized as a dendrogram (Fig 2C). It is important to note that this agglomeration required us to define a level in the hierarchy that would correspond to a certain number of clusters. The linkage function from the SciPy v1.9.0 Python library was utilized for this step [6].
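The variable-clustering step can be sketched in a few lines of Python. This is one possible reading of the description, in which each row of the absolute Kendall's tau matrix is treated as an observation vector passed to Ward linkage. The synthetic features and their names are hypothetical, and the use of `fcluster` to cut the hierarchy is an assumption about how a cluster count would be imposed.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
n = 300
low = rng.normal(size=n)    # latent "low pressure" factor (synthetic)
swing = rng.normal(size=n)  # latent "variability" factor (synthetic)

# Four hypothetical variables forming two correlated families.
features = pd.DataFrame({
    "min_map": low + 0.1 * rng.normal(size=n),
    "min_sap": low + 0.1 * rng.normal(size=n),
    "drop_pp": swing + 0.1 * rng.normal(size=n),
    "sd_pp": swing + 0.1 * rng.normal(size=n),
})

# Absolute Kendall's tau between every pair of variables.
abs_tau = features.corr(method="kendall").abs()

# Ward agglomeration; rows of the |tau| matrix serve as observation vectors.
Z = linkage(abs_tau.to_numpy(), method="ward")

# Cut the hierarchy at a chosen number of clusters (here k = 2).
labels = fcluster(Z, t=2, criterion="maxclust")
clusters = dict(zip(features.columns, labels))
```

With these synthetic data, the two "low pressure" variables and the two "variability" variables should fall into separate clusters; `scipy.cluster.hierarchy.dendrogram(Z)` can be used to draw the corresponding tree.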
A criterion for selecting the best variable within a given cluster, using effect size, was first established. Within each cluster, variables were ranked according to the absolute standardized mean difference (aSMD), which measures the effect size of a variable between patients with and without pLOS. The aSMD is calculated as $\mathrm{aSMD} = |\mu_1 - \mu_2| / \sigma_{\mathrm{effect}}$, where $\sigma_{\mathrm{effect}}$ is the standard deviation in the effect group and $\mu_1 - \mu_2$ is the mean difference between the two groups [7]. This method is model-free and maintains interpretability compared to methods such as principal component analysis (label-free) or linear discriminant analysis (label-dependent); unlike ROC AUC, aSMD distributes more linearly and does not require model fine-tuning. Within each cluster, the aSMD was computed for each variable, the variables were ranked, and the one with the largest effect size was retained.
Finally, as noted earlier, the number of clusters 'k' is a free parameter. A process to select 'k' using the best variables from different clusters was developed (Fig 2D), in which a likelihood-based performance metric was evaluated for 'k' values ranging from 1 to 10 clusters. A given 'k' corresponded to k clusters and thus k best variables. These variables were used as predictors in a multivariable logistic regression model of pLOS. From this model, the Bayesian information criterion (BIC) was calculated, which balances model performance (maximized likelihood L) and complexity (number of clusters k), and accounts for sample size (N): $\mathrm{BIC} = k \ln(N) - 2 \ln(L)$ [8]. A penalty equal to ln(N) times the number of non-significant p-values (p≥0.05) in the logistic model was added to include variable significance in the model selection process. This penalized BIC thus achieves a trade-off between model performance and complexity while favoring models whose variables are significant. The model, and therefore the number of clusters, with the lowest modified BIC was ultimately selected (Fig 2E and 2F). After identifying the number of clusters using the modified BIC from the logistic model, only the independent variables (with p<0.05) were retained for the final variable selection (Fig 2G). Confidence intervals were estimated from a 1000-iteration bootstrap, and BIC was computed using the statsmodels.api v0.13.2 Python library [9].
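The two selection criteria described above can be written compactly in Python. The sketch below is illustrative rather than the authors' code. The function names and toy numbers are hypothetical, and the "effect group" is assumed to be the pLOS group. The log-likelihood and p-values are passed in as plain numbers so that the arithmetic of the penalized BIC is explicit.

```python
import math

import numpy as np


def asmd(x, y):
    """Absolute standardized mean difference of x between y == 1 and y == 0.

    Assumption: the denominator is the sample SD of x in the pLOS (y == 1) group.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).astype(bool)
    return abs(x[y].mean() - x[~y].mean()) / x[y].std(ddof=1)


def penalized_bic(log_likelihood, k, n_obs, p_values, alpha=0.05):
    """BIC = k ln(N) - 2 ln(L), plus ln(N) per non-significant p-value."""
    bic = k * math.log(n_obs) - 2.0 * log_likelihood
    n_nonsignificant = sum(p >= alpha for p in p_values)
    return bic + n_nonsignificant * math.log(n_obs)


# Pick the variable with the largest aSMD within one hypothetical cluster.
y = [0, 0, 0, 1, 1, 1]
cluster_vars = {
    "var_a": [1, 2, 3, 4, 5, 6],
    "var_b": [1, 2, 3, 2, 3, 4],
}
best = max(cluster_vars, key=lambda name: asmd(cluster_vars[name], y))

# Penalized BIC for a hypothetical 3-variable model with one p-value >= 0.05.
mbic = penalized_bic(-100.0, k=3, n_obs=1000, p_values=[0.01, 0.2, 0.04])
```

In practice the inputs would come from a fitted logistic model; for example, statsmodels `Logit` results expose `llf` (log-likelihood) and `pvalues`. Repeating the computation for k = 1 to 10 and keeping the smallest value corresponds to the selection rule in the text.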
To calculate the sample size, a significance level of 0.05 and a power of 80% (beta = 0.2) were set. Using Hsieh's method [10], a binary outcome (pLOS) with the primary covariate having an odds ratio (OR) of 1.2 was anticipated. A non-informed scenario for prevalence, with P0 set at 0.5, was assumed. Given the robust hierarchical interaction among the arterial pressure variables, the sample size for multiple variables was adjusted by applying a correction based on a high squared multiple correlation coefficient of 0.95. This adjustment led to a required sample size of 4,740 patients. Note that we took advantage of the larger dataset to refine our machine learning model.
All statistical analyses were conducted using a significance threshold of α = 0.05. Ordinal variables were presented as median and interquartile range (IQR), while categorical variables were displayed as count and percentage. Fisher or Mann-Whitney tests were used, as appropriate, to compare covariates. For logistic models, pLOS was modelled as a random variable from a binomial distribution: pLOS ~ Binomial(1, p), where the probability p was derived from the logistic model [11]. To assess a linear change in odds, variables used for odds ratio (OR) estimation were inspected using the R package Hmisc [12]. For instance, a cumulative time with a pulse pressure (PP) above 61mmHg (CumTimePP >61mmHg) exceeding 50 minutes was marked as 1, and 0 otherwise. Drop PP�25mmHg was set to 25mmHg, 1std = 14.9mmHg. Statistical analyses were performed using R: descriptive statistics and univariate tests, including aSMD, were computed using the CreateTableOne function (tableone package) [13], and forest plots of logistic regression were derived from glm functions.

Arterial pressure variables linked to pLOS, based on the 75th and 90th percentile of LOS
The same unsupervised clustering method was used to identify critical arterial pressure variables when pLOS was defined by lengths of stay exceeding the 75th (pLOS75) and 90th (pLOS90) percentiles. Of the patients, 1808 (19%) and 832 (9%) had a pLOS75 and pLOS90, respectively. For pLOS75, the key associated variables were Drop PP (aSMD = 0.32[0.27,0.38]) and Min MAP (aSMD = 0.17[0.16,0.23]). After accounting for the two variables in a multivariable logistic regression, Min MAP and Drop PP were found to be independent. For pLOS90, associated with the longest LOS, only the Drop PP variable remained significant (aSMD = 0.35[0.27,0.44]). Despite changes in the pLOS outcome, the best variable for each cluster remained consistent, highlighting the robustness of these variables.

Discussion
Our study presents a data-driven method involving a clustering process and an effect-size based comparison to establish a statistical relationship between a comprehensive set of arterial pressure variables and pLOS. This approach excluded the use of clinical characteristics which are known to be predictors of post-operative morbidity and mortality. In a monocentric dataset of 9,516 patients, larger pulse pressure fluctuations (Drop PP), lower minimum MAP (Min MAP), and a shorter time spent above 61mmHg of PP (CumTime PP>61mmHg) were independently associated with increased pLOS risk. The analysis distilled four clusters from 330 variables, yielding three independent variables. Interestingly, Drop PP remained associated with the 75th and 90th longest LOS. The suggested approach allowed us to analyze a group of variables that are highly interrelated, grouping them into clusters that represent families of related variables. This could facilitate a deeper understanding of the connections between various studies using different arterial pressure variables, thereby enhancing the comparability and interpretation of their findings. Vernooij and colleagues conducted systematic research, exploring a variety of definitions for IoH [14]. Although each definition had its own relevance, the sheer number made it challenging to establish a unique, clear IoH definition. Additionally, selecting an IoH definition often intertwines with the specific clinical outcome under investigation, such as acute kidney injury or pLOS [3]. With numerous potential factors contributing to IoH, creating a comprehensive overview becomes a considerable challenge.
This study used hierarchical clustering: an unsupervised algorithm independent of the pLOS outcome. Unlike the conventional application of this method to patient data, we focused on variables, grouping them based on observed correlation. This approach enabled us to examine closely related variables such as the cumulative time MAP remained below 65mmHg and 70mmHg (S2 Table). Our analysis could help explain why low systolic and mean arterial pressure variables, which both belong to cluster 3, have been reported to have similar impacts on clinical outcomes like acute kidney injury, stroke, or myocardial injury [2]. In contrast, cluster 1, which reflects variability in arterial pressure, likely bears characteristics distinct from those of cluster 3. This finds echoes in the literature, for instance in the findings of Hirsch et al., who identified arterial pressure fluctuation, rather than hypotension, as a risk factor for postoperative delirium [1]. In essence, we suggest that opting for the most representative variable from each cluster, rather than amalgamating them, could more accurately unveil the effect of an arterial pressure variable group on a specific outcome.
Our method uniquely identifies key predictors within a candidate pool by capitalizing on the data's hierarchical nature and high collinearity. This approach differs from logistic regression models, which often struggle with collinear variables [2,3] and can lead to a skewed understanding of their roles [15]. By focusing on clusters, we broaden our understanding and avoid arbitrary variable selection. It also stands apart from techniques like LASSO or Elastic Net, which may arbitrarily select 'better' candidates amidst high collinearity [16]. In addition, aSMD was used to select a single arterial pressure variable within each cluster. This step, influenced by clinical outcomes, allowed variables to be ranked by effect size, which reflects effect strength, unlike p-values from two-sample tests that solely reject a null hypothesis [17]. Despite aSMD's inability to account for nonlinearities, it was preferred over metrics like ROC AUC, given its advantages in speed, simplicity, and being model-free, avoiding issues such as model selection and hyper-parameter tuning [15,18].

Strengths and limitations
This study has notable strengths. Firstly, the large patient cohort allowed for the inclusion of various surgical types, enhancing the robustness of the analysis. The inclusion and non-inclusion criteria used define a fairly homogeneous patient population. Another strength is that the outcome, pLOS, is easy to obtain and has no missing data. The proposed method offers a dual perspective: it not only identifies key variables but also uncovers families of variables, potentially bridging diverse research efforts focusing on similar clinical outcomes but different biomarkers. Focusing exclusively on blood pressure variables, this approach reveals clusters linked with pLOS. However, further studies should address the impact of confounding factors on pLOS. Interestingly, it is plausible that important features such as age or comorbidities, like hypertension, could be encompassed within certain clusters identified by this method, a question beyond the scope of this study.
However, this study has some limitations.
Its monocentric nature raises the potential for variations in arterial pressure distribution and pLOS definitions across different centers and countries due to distinct healthcare practices. For example, one might expect longer pLOS in France as compared to Anglo-American countries [19]. Indeed, it is important to note that, in France, the joint responsibility of anesthesiologists and surgeons for postoperative patient management may impact the study's context. LOS is an outcome that, despite being easy to measure, can be influenced by factors that cannot be controlled in practice, including non-clinical factors or individual doctor policies. To generalize our findings, future research should quantify the effect variability across multiple centers and validate identified pLOS variables in different international contexts.
In constructing variables, absolute BP values were favored over relative ones because of the ambiguity in defining baseline BP. Hence, our study does not explore the relationship between pLOS and relative BP values, a widely considered factor in preventing intraoperative hypotension [4].
Our analysis primarily relies on non-invasive brachial cuff measurements, yielding poor temporal resolution, which could impact the accuracy of variables tied to arterial pressure variability, such as Drop PP. While invasive catheter monitoring offers superior temporal resolution, its application is not in line with current medical practices and is reserved for specific interventions. Future research could explore digital photoplethysmography for improved temporal resolution, though its accuracy needs validation [20].
While this study statistically correlates three arterial pressure variables with pLOS, it cannot ascertain whether improving these factors would shorten the stay. To confirm the role of these variables, a prospective randomized clinical trial is needed.

Conclusion
In conclusion, we developed an approach identifying that intraoperative variability in PP, minimum MAP values, and PP duration above 61mmHg are associated with pLOS. Our cluster-based approach made it possible to handle a diverse set of collinear arterial pressure variables, including multiple IoH definitions. This technique reveals clusters of variables that could have similar clinical implications, as well as those that are likely independent. This scalable method could readily be applied to any dichotomized outcome, such as mortality, acute kidney injury, or myocardial injury. We encourage other researchers to explore this evaluation method in various clinical settings, surgical groups, and patient populations (e.g., elderly, patients with hypertension). Future studies should aim to investigate additional variables and potentially assess the long-term impact of intraoperative management strategies based on these results.

Fig 1. Construction of prolonged length of stay (pLOS based on median duration). LOS: length of stay. The definition of the pLOS outcome starts with seven surgery classes: digestive, thoracic, gynecological, neurosurgical, ENT, urological, and vascular interventions. Each class is subdivided into four groups based on surgery duration quartiles. The median LOS was computed for each subgroup, and pLOS was defined as LOS > median LOS. https://doi.org/10.1371/journal.pone.0308910.g001

Fig 2. Variable extraction and selection from the arterial pressure signal (pLOS based on median duration). A: Non-invasive arterial pressure signals include MAP, SAP, DAP and PP, which are pre-processed (artefact removal, imputation, interpolation) and from which a panel of variables was computed (e.g., mean, max, cumulative times under given thresholds). B: Dependence between variables was obtained by computing a correlation matrix based on Kendall's tau values. C: The matrix was then used for hierarchical agglomeration using the Ward method and represented as a dendrogram. D: At this stage, the number of clusters could be chosen arbitrarily by varying the level of agglomeration. Within each cluster, variables were sorted by aSMD and the best candidate (red dot) was kept. E: To determine the number of clusters to consider, the best variable per cluster was used in a multivariable logistic regression model of the pLOS outcome. The number of clusters with the smallest modified Bayesian information criterion, mBIC = BIC + (number of non-significant p-values) ln(N), was then kept. F: Four clusters represented by the best variables were obtained: the difference between the highest and lowest pulse pressure values computed over the entire intervention (Drop PP), the lowest MAP reached during the intervention (Min MAP), the cumulative time PP spent above 61mmHg (CumTimePP>61), and the cumulative time SAP spent below 132mmHg. G: Finally, the independence between variables was tested, resulting in the SAP-related variable being dropped. https://doi.org/10.1371/journal.pone.0308910.g002